Take home exercise 1

Author

yang yayong

Published

September 2, 2024

Modified

September 29, 2024

Geospatial Analytics for Social Good:

Thailand Road Accident Case Study

1 The scene

Overview According to the World Health Organization (WHO), road traffic accidents cause approximately 1.19 million deaths annually, with 20-50 million non-fatal injuries. Vulnerable road users, such as pedestrians, cyclists, and motorcyclists, make up more than half of all deaths. Road traffic injuries are the leading cause of death for children and young adults aged 5–29, with two-thirds of fatalities occurring among working-age individuals (18–59 years). Low- and middle-income countries, which own 60% of the world’s vehicles, account for 90% of road fatalities. Road accidents also impose significant economic burdens, costing countries 3% of their annual GDP.

Thailand has the highest road death rate in Southeast Asia, with around 20,000 deaths annually. From 2014 to 2021, 19% of accidents occurred on national highways, and there was a 66% chance of encountering accident-prone zones (“black spots”), with 66% on straight roads, 13% at curves, and smaller percentages at intersections, bridges, and slopes.

1.1 Objectives

we are tasked to discover factors affecting road traffic accidents in the Bangkok Metropolitan Region BMR by employing both spatial spatio-temporal point patterns analysis methods.

The specific objectives of this take-home exercise are as follows:

  • To visualize the spatio-temporal dynamics of road traffic accidents in BMR using appropriate statistical graphics and geovisualization methods.

  • To conduct detailed spatial analysis of road traffic accidents using appropriate Network Spatial Point Patterns Analysis methods.

  • To conduct detailed spatio-temporal analysis of road traffic accidents using appropriate Temporal Network Spatial Point Patterns Analysis methods.

1.3 The Data

For the purpose of this exercise, three basic data sets are needed, they are:

  • Thailand Road Accident [2019-2022] on Kaggle,a point feature layer in CSV format: The dataset contains point features representing road accidents in Thailand from 2019 to 2022. Each feature represents a unique accident event, with spatial attributes (latitude and longitude) and other associated accident data, such as the date, time, severity, and road type.

  • Thailand Roads (OpenStreetMap Export) on HDX,a line feature layer in ESRI Shapefile format: The dataset contains line features representing the road network of Thailand. Each feature represents a section of a road, with spatial attributes and additional road-related information.

  • Thailand - Subnational Administrative Boundaries on HDX,a polygon feature layer in ESRI Shapefile format.

2 Installing and Loading the R packages

pacman::p_load(sf, raster, spatstat,sparr,tmap,tidyverse,spNetwork,lubridate,gridExtra,readr,reshape2,viridis,gifski,classInt,knitr)

Explanations for the imported library:

  • sf: For handling geospatial data and geometries.

  • raster: Deals with raster data and geographic grid layers.

  • spatstat: Spatial data analysis and point pattern analysis.

  • sparr: Performs spatial relative risk estimation and kernel density estimation.

  • tmap: For interactive and static maps.

  • tidyverse: For data manipulation and visualization.

  • spNetwork: Network-based spatial analysis.

  • lubridate: Simplifies working with dates and times.

  • gridExtra: For arranging and combining plots.

  • readr: Fast reading of rectangular data.

  • reshape2: For reshaping data.

  • viridis: Provides color maps for data visualization.

  • gifski: For GIF creation.

  • classInt: For interval classification.

  • knitr: Creates dynamic reports in R

2.1 Spatial Data Wrangling

2.1.1 Importing & Converting an Aspatial Data

We first select the Bangkok Metropolitan Region (BMR).

selected_provinces <- c("Bangkok", "Nonthaburi", "Pathum Thani", "Samut Prakan", "Nakhon Pathom", "Samut Sakhon","knitr")

2.1.2 Data preprocessing and transformation

acci <- read_csv("data/rawdata/thai_road_accident_2019_2022.csv") %>%
  mutate(Year = year(incident_datetime))  %>%
  mutate(Month_num = month(incident_datetime)) %>%
  mutate(Month_fac = month(incident_datetime, label = TRUE, abbr = TRUE)) %>%
  mutate(dayofweek = wday(incident_datetime, label = TRUE)) %>%
  mutate(dayofyear = yday(incident_datetime)) %>%  
  mutate(hour = hour(incident_datetime)) %>% 
  mutate(Month = floor_date(incident_datetime, "month")) %>%
  filter(!is.na(longitude) & longitude != "", 
         !is.na(latitude) & latitude != "",
         province_en %in% selected_provinces) %>%  
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
  st_transform(crs = 32647)  %>%
  mutate(Time = as.numeric(difftime(incident_datetime, min(incident_datetime), units = "days")))
Rows: 81735 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (10): province_th, province_en, agency, route, vehicle_type, presumed_c...
dbl   (6): acc_code, number_of_vehicles_involved, number_of_fatalities, numb...
dttm  (2): incident_datetime, report_datetime

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Following steps:

  1. Load Data: Reads a CSV file containing accident spatial data.

  2. Date-Time Operations: Extracts year, month, day, and hour from the accident’s date and time, as well as adds new columns for day of the week and day of the year.

  3. Filter Data: Removes records with missing or invalid latitude/longitude and limits data to specific provinces.

  4. Spatial Transformation: Converts the accident points into spatial objects with proper CRS and transforms the coordinates.

  5. Add Time Variable: Computes the time difference from the first incident in days.

write_rds(acci,"data/rds/acci.rds")

acci <- read_rds("data/rds/acci.rds")
glimpse(acci)
Rows: 12,986
Columns: 25
$ acc_code                    <dbl> 571882, 600001, 605043, 629691, 571887, 59…
$ incident_datetime           <dttm> 2019-01-01 02:25:00, 2019-01-01 03:00:00,…
$ report_datetime             <dttm> 2019-01-02 17:32:00, 2019-01-05 10:33:00,…
$ province_th                 <chr> "นครปฐม", "นนทบุรี", "สมุทรปราการ", "กรุงเทพมห…
$ province_en                 <chr> "Nakhon Pathom", "Nonthaburi", "Samut Prak…
$ agency                      <chr> "department of rural roads", "department o…
$ route                       <chr> "แยกทางหลวงหมายเลข 4 (กม.ที่ 51+920) - บ้านวัด…
$ vehicle_type                <chr> "motorcycle", "private/passenger car", "pr…
$ presumed_cause              <chr> "speeding", "speeding", "running red light…
$ accident_type               <chr> "rollover/fallen on straight road", "rollo…
$ number_of_vehicles_involved <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 2, …
$ number_of_fatalities        <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, …
$ number_of_injuries          <dbl> 2, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, …
$ weather_condition           <chr> "clear", "clear", "clear", "clear", "clear…
$ road_description            <chr> "straight road", "straight road", "other",…
$ slope_description           <chr> "no slope", "no slope", "other", "other", …
$ Year                        <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, …
$ Month_num                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ Month_fac                   <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Ja…
$ dayofweek                   <ord> Tue, Tue, Tue, Tue, Tue, Tue, Tue, Tue, Tu…
$ dayofyear                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ hour                        <int> 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 9, 10, 11, 1…
$ Month                       <dttm> 2019-01-01, 2019-01-01, 2019-01-01, 2019-…
$ geometry                    <POINT [m]> POINT (627012.3 1533381), POINT (655…
$ Time                        <dbl> 0.00000000, 0.02430556, 0.02430556, 0.0277…

Notice: Table above shows the content of acci, a new column called geometry has been added into the data frame. On the other hand, the longitude and latitude columns have been dropped from the data frame.

2.1.3 Mapping the spatial data set

tmap_mode('view')
tmap mode set to interactive viewing
tm_shape(acci)+
  tm_dots(col = 'red')
tmap_mode('plot')
tmap mode set to plotting

2.2 Geospatial Data Wrangling

2.2.1 Importing the geospatial data

road <- st_read(dsn = "data/rawdata", layer = "hotosm_tha_roads_lines_shp") 

From the data information, we can notice that it has no CRS so we will transform it in the later analysis

write_rds(road,"data/rds/road.rds")
road <- read_rds("data/rds/road.rds")
glimpse(road)
Rows: 2,792,590
Columns: 15
$ name       <chr> "ถนนฉลองกรุง", "ซอยฉลองกรุง 1/1", NA, NA, "ถนนฉลองกรุง", NA, "…
$ name_en    <chr> "Chalong Krung Road", "Soi Chalong Krung 1/1", NA, NA, "Cha…
$ highway    <chr> "secondary", "residential", "secondary_link", "service", "s…
$ surface    <chr> "paved", NA, NA, NA, "concrete", NA, NA, "unpaved", NA, NA,…
$ smoothness <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ width      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ lanes      <chr> NA, NA, NA, NA, "2", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ oneway     <chr> "yes", NA, "yes", NA, "yes", NA, NA, NA, NA, NA, NA, NA, NA…
$ bridge     <chr> NA, NA, NA, NA, "yes", NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ layer      <chr> NA, NA, NA, NA, "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ source     <chr> NA, NA, NA, NA, "Bing", NA, NA, "GPS", NA, NA, NA, NA, NA, …
$ name_th    <chr> "ถนนฉลองกรุง", "ซอยฉลองกรุง 1/1", NA, NA, "ถนนฉลองกรุง", NA, "…
$ osm_id     <dbl> 1125681229, 594401607, 472283206, 594401608, 116847248, 317…
$ osm_type   <chr> "ways_line", "ways_line", "ways_line", "ways_line", "ways_l…
$ geometry   <MULTILINESTRING> MULTILINESTRING ((100.7913 ..., MULTILINESTRING…
2.2.1.1 Select target car accidence road
unique(road$highway)
 [1] "secondary"      "residential"    "secondary_link" "service"       
 [5] "tertiary"       "path"           "footway"        "track"         
 [9] "unclassified"   "trunk"          "trunk_link"     "primary"       
[13] "primary_link"   "steps"          "motorway_link"  "cycleway"      
[17] "pedestrian"     "tertiary_link"  "motorway"       "construction"  
[21] "road"           "raceway"        "corridor"       "living_street" 
[25] "escape"         "proposed"       "busway"         "bridleway"     
[29] "abandoned"      "parth"          "barrier"        "paved"         
vehicle_type_counts <- acci %>%
  count(vehicle_type, sort = TRUE)
ggplot(vehicle_type_counts, aes(x = reorder(vehicle_type, n), y = n)) +
  geom_bar(stat = "identity") +
  coord_flip() +  
  labs(x = "Vehicle Type", y = "Count", title = "Number of Accidents by Vehicle Type")

From the bar chart,we notice that the car,truck and the motorcycle take up the most of all the car accidence,so in this case,we mainly focus on these three types of car accidence.

selected_roads <- road %>%
  filter(highway %in% c("motorway", "trunk", "primary","secondary"))
2.2.1.2 Visualising the Geospatial Data
selected_roads3 <- road %>%
  filter(highway %in% c("motorway", "trunk", "primary"))

2.2.2 Importing the boundary geospatial data

admin_level_1 <- st_read(dsn = "data/rawdata", 
                         layer = "tha_admbnda_adm1_rtsd_20220121")
Reading layer `tha_admbnda_adm1_rtsd_20220121' from data source 
  `/Users/yangyayong/Downloads/学校文件/smu文件/Term 3/G/yyyirene/ISSS626-GAA/Take_home_Ex/Take_home_Ex01/data/rawdata' 
  using driver `ESRI Shapefile'
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
admin_level_2 <- st_read(dsn = "data/rawdata", 
                         layer = "tha_admbnda_adm2_rtsd_20220121")
Reading layer `tha_admbnda_adm2_rtsd_20220121' from data source 
  `/Users/yangyayong/Downloads/学校文件/smu文件/Term 3/G/yyyirene/ISSS626-GAA/Take_home_Ex/Take_home_Ex01/data/rawdata' 
  using driver `ESRI Shapefile'
Simple feature collection with 928 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84

2.2.2.1 Select the boundary of BMR

admin_level_1_filtered <- admin_level_1 %>%
  filter(ADM1_EN %in% selected_provinces)

admin_level_2_filtered <- admin_level_2 %>%
  filter(ADM1_EN %in% selected_provinces)
write_rds(admin_level_1_filtered,"data/rds/admin_level_1_filtered.rds")
write_rds(admin_level_2_filtered,"data/rds/admin_level_2_filtered.rds")
admin_level_1_filtered <- read_rds("data/rds/admin_level_1_filtered.rds")
admin_level_2_filtered <- read_rds("data/rds/admin_level_2_filtered.rds")

2.2.2.2 Visualising the Geospatial Data

ggplot() +
  geom_sf(data = admin_level_1_filtered, color = "black", size = 0.5)

2.2.2.3 Setting the Coordinate Reference System, CRS

In order to intersect with the boundary data ,we should let them in the same CRS.

selected_roads <- st_set_crs(selected_roads, 4326)

2.2.2.4 Spatial Intersection

road_i <- st_intersection(selected_roads, admin_level_1_filtered)
saveRDS(road_i,"data/rds/road_i.rds")
road_i <- readRDS("data/rds/road_i.rds")

2.2.2.6 Projection Transformation

We need to transform the spatial data into the Thailand National Coordinate System (EPSG: 32647, UTM zone 47N)

road_i_t <- st_transform(road_i, crs = 32647)
admin_level_1_filtered_32647 <- st_transform(admin_level_1_filtered, crs = 32647)

admin_level_2_filtered_32647 <- st_transform(admin_level_2_filtered, crs = 32647)

3. Network Spatial Point Patterns Analysis

3.1 Visualising the Geospatial Data

tm_shape(road_i_t) +  
  tm_lines("black") + 
  tm_shape(acci) + 
  tm_dots("red", size = 0.002)

3.2 Visualising Accident Density in BMR

ggplot() +
  stat_density_2d(data = acci, aes(x = st_coordinates(acci)[,1], y = st_coordinates(acci)[,2], fill = ..level..), geom = "polygon", color = "red") +
  geom_sf(data = admin_level_1_filtered_32647, fill = "NA") +
  scale_fill_viridis_c() +
  labs(title = "Heatmap of Accident Density in BMR") +
  theme_minimal()
Warning: The dot-dot notation (`..level..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(level)` instead.

We can indeed see that the areas bordering other provinces have a relatively high concentration of traffic accidents, especially in the southeastern and boundary areas of the Bangkok Metropolitan Region (BMR).

3.4 Select target areas

3.4.1 Convert the road data to LINESTRING

road_i_lines <- st_cast(road_i_t, "LINESTRING", do_split = TRUE)
Warning in st_cast.MULTILINESTRING(X[[i]], ...): keeping first linestring only
Warning in st_cast.MULTILINESTRING(X[[i]], ...): keeping first linestring only
Warning in st_cast.MULTILINESTRING(X[[i]], ...): keeping first linestring only
Warning in st_cast.MULTILINESTRING(X[[i]], ...): keeping first linestring only
Warning in st_cast.MULTILINESTRING(X[[i]], ...): keeping first linestring only
Bangkok <- admin_level_1_filtered_32647 %>%
  filter(ADM1_EN == "Bangkok") 
Bangkok_b <- admin_level_2_filtered_32647 %>%
  dplyr::filter(ADM1_EN == "Bangkok")  %>%
  filter(ADM2_EN=="Lat Krabang" | ADM2_EN=="Prawet") 
acci_b <- acci %>%
  dplyr::filter(province_en == "Bangkok")
road_i_t_filtered2 <- road_i_lines %>%
  filter(highway %in% c("motorway"))
plot(road_i_t_filtered2)
Warning: plotting the first 9 out of 30 attributes; use max.plot = 30 to plot
all

road_b <- st_intersection(Bangkok_b,road_i_t_filtered2)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
road_b <- st_cast(road_b, "LINESTRING")

3.5 Preparing the lixels objects

lixels <- lixelize_lines(road_b, 
                         500, 
                         mindist = 250)

3.5.1 Generating line centre points

samples <- lines_center(lixels) 

3.5.2 Performing NKDE

densities <- nkde(road_b, 
                  events = acci_b,
                  w = rep(1, nrow(acci_b)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(5,5), 
                  max_depth = 8,
                  agg = 5, 
                  sparse = TRUE,
                  verbose = FALSE)
saveRDS(densities, "data/rds/densities.rds")
densities <- readRDS("data/rds/densities.rds")
samples$density <- densities
lixels$density <- densities
# rescaling to help the mapping
samples$density <- samples$density*1000
lixels$density <- lixels$density*1000

3.5.3 Visualising target areas NKDE

tmap_mode("plot")
tmap mode set to plotting
tm_shape(Bangkok_b) + 
  tm_borders(col = "black", lwd = 1) +  
  tm_text("ADM2_EN", size = 1, col = "red", shadow = TRUE) +
tm_shape(lixels) + 
  tm_dots(col = "density", style = "kmeans", n = 8, palette = "viridis", size = 0.02) + 
  tm_layout(legend.outside = TRUE)

From the plot we found that the area of “Lat Krabang” and Prawet” show higher density than other area of the Bangkok,so we decide to find more insight of the accidence reason.

we also found that both of them happen in the boundary between “Lat Krabang” and Prawet” as well as the area near the airport.

3.5.4 Computing K-fucntion Estimate

Lat_b <- admin_level_2_filtered_32647 %>%
  filter(ADM2_EN %in% c("Lat Krabang"))
acci_ppp <- as.ppp(acci)
Warning in as.ppp.sf(acci): only first attribute column is used for marks
Lat_b_owin <- as.owin(Lat_b)

acci_tm_ppp <- acci_ppp[Lat_b_owin]

acci_tm_ppp.km <- rescale(acci_tm_ppp, 1000, "km")
K_lp = Kest(acci_tm_ppp, correction = "Ripley")
saveRDS(K_lp, "data/rds/K_lp.rds")
K_lp <- readRDS("data/rds/K_lp.rds")
plot(K_lp, . -r ~ r, ylab= "K(d)-r", xlab = "d(m)")

The black solid line is significantly higher than the red dashed line, especially at larger distances, which indicates that there is obvious spatial aggregation of accident points in this area.

To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:

Ho = The distribution of accidence at “Lat Krabang” and “Prawet” are randomly distributed.

H1= The distribution of accidence at “Lat Krabang” and “Prawet” are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

The code chunk below is used to perform the hypothesis testing.

K_lp.csr <- envelope(acci_tm_ppp, Kest, nsim = 5, rank = 1, glocal=TRUE)
saveRDS(K_lp.csr, "data/rds/K_lp.csr.rds")
K_lp.csr <- readRDS("data/rds/K_lp.csr.rds")
plot(K_lp.csr, . - r ~ r, xlab="d", ylab="K(d)-r")

The second plot adds confidence intervals and further confirms the spatial clustering by comparing the observed values ​​with the theoretical model. Therefore, the null hypothesis can be rejected, indicating that the distribution of accidents in the “Lat Krabang” and “Prawet” areas are not randomly distributed.

This means that certain geographical or road features (e.g. airport sections, traffic density, inadequate traffic control measures, regional boundaries, etc.) may lead to a high frequency of accidents.

3.6 Finding the highest density

max_density_index <- which.max(lixels$density)
max_density_value <- lixels$density[max_density_index]
max_density_segment <- lixels[max_density_index, ]
max_density_segment
Simple feature collection with 1 feature and 51 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 679364 ymin: 1519310 xmax: 679394.6 ymax: 1519315
Projected CRS: WGS 84 / UTM zone 47N
   lineID Shape_Leng  Shape_Area ADM2_EN ADM2_TH ADM2_PCODE ADM2_REF ADM2ALT1EN
82     82  0.2907119 0.004503014  Prawet  ประเวศ     TH1032     <NA>       <NA>
   ADM2ALT2EN ADM2ALT1TH ADM2ALT2TH ADM1_EN      ADM1_TH ADM1_PCODE  ADM0_EN
82       <NA>       <NA>       <NA> Bangkok กรุงเทพมหานคร       TH10 Thailand
     ADM0_TH ADM0_PCODE       date    validOn validTo            name
82 ประเทศไทย         TH 2019-02-18 2022-01-22    <NA> ถนนกรุงเทพฯ-ชลบุรี
                     name_en  highway surface smoothness width lanes oneway
82 Bangkok-Chonburi Motorway motorway    <NA>       <NA>  <NA>     4    yes
   bridge layer source         name_th   osm_id  osm_type Shape_Leng.1
82    yes     1   <NA> ถนนกรุงเทพฯ-ชลบุรี 28495533 ways_line     2.417227
   Shape_Area.1 ADM1_EN.1    ADM1_TH.1 ADM1_PCODE.1 ADM1_REF ADM1ALT1EN
82    0.1313387   Bangkok กรุงเทพมหานคร         TH10     <NA>       <NA>
   ADM1ALT2EN ADM1ALT1TH ADM1ALT2TH ADM0_EN.1 ADM0_TH.1 ADM0_PCODE.1     date.1
82       <NA>       <NA>       <NA>  Thailand ประเทศไทย           TH 2019-02-18
    validOn.1 validTo.1                       geometry  density
82 2022-01-22      <NA> LINESTRING (679394.6 151931... 2.314315

From the information provided,we can notice that the highest density area happened in the Prawet.

3.6.1 Visualising the highest density

p_h <- admin_level_2_filtered_32647 %>%
  filter( ADM2_EN == "Prawet")
plot(p_h)
Warning: plotting the first 9 out of 19 attributes; use max.plot = 19 to plot
all

road_i_t_filtered3 <- road_i_lines %>%
  filter(highway %in% c("motorway"))
road_hd <- st_intersection(p_h,road_i_t_filtered3)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
plot(road_hd)
Warning: plotting the first 10 out of 49 attributes; use max.plot = 49 to plot
all

acci_high1 <- acci %>%
  filter(province_en == "Bangkok" & agency == "department of highways" & vehicle_type == "private/passenger car")
road_buffer <- st_buffer(road_hd, dist = 50)

acci_road_intersection <- st_intersection(acci_high1, road_buffer)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
tmap_mode("plot")
tmap mode set to plotting
tm_shape(road_hd) +
  tm_lines() +
  tm_shape(acci_road_intersection) +
  tm_dots(col = "red", size = 0.05) 

We found that the high density areas are happening from airport area of the road junction.However,we did not consider of the time aspect,so in the next section,we will find more insight with the time factor.

4 Temporal-Spatial Point Patterns Analysis

4.1 Visualising Daily Accidents in BMR

acci_high1 %>%
  mutate(date = as.Date(incident_datetime)) %>%
  group_by(date) %>%
  summarize(accidents = n()) %>%
  ggplot(aes(x = date, y = accidents)) +
  geom_line(col = "steelblue") +
  labs(title = "Daily Accidents in BMR", x = "Date", y = "Number of Accidents") +
  theme_minimal()

ggplot(acci_high1, aes(x = incident_datetime)) +
  geom_histogram(binwidth = 30 * 86400,fill = "steelblue") +  
  labs(title = "Accident Trend over Time (2019-2022)", x = "Date", y = "Number of Accidents") +
  theme_minimal()

ggplot(acci_high1, aes(x = year(incident_datetime))) +
  geom_histogram(binwidth = 1,fill = "steelblue") + 
  labs(title = "Accident Trend by Year", x = "Year", y = "Number of Accidents") +
  theme_minimal()

ggplot(acci_high1, aes(x = hour)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +  
  scale_x_continuous(breaks = 0:23) + 
  labs(x = "Hour of the Day", y = "Number of Accidents", title = "Accidents Distribution by Hour (0-24)") +
  theme_minimal()

From the four plots,we found that Before the COVID-19 outbreak, there were many road traffic accidents before 2020,and after recovering COVID in 2022.Also ,most of the car accidence happened in the well peek time aound 9 am and 19 pm.

accident_counts <- acci_high1 %>%
  count(Month, dayofweek)

plot_heatmap <- ggplot(accident_counts, aes(x = Month, y = dayofweek, fill = n)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(option = "C") +
  labs(x = "Month", y = "Day of Week", fill = "Accident Count", title = "2019 Accident Heatmap") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  

plot(plot_heatmap)

Here we can firstly notice that the accidence happened mostly at the end of the years and the weekend as well.

Next,we will focus most in the season aspect.

4.2 Visualising Temporal-Spatial Dynamics of Road Accidents in BMR

library(gganimate)

Attaching package: 'gganimate'
The following object is masked from 'package:raster':

    animate
acci_2019 <- acci_high1 %>%
  filter(Year == 2019 &  Month_num == 4)  
p <- ggplot() +
  geom_sf(data = admin_level_1_filtered_32647, fill = "gray90") + 
  geom_point(data = acci_2019, aes(x = st_coordinates(acci_2019)[,1], y = st_coordinates(acci_2019)[,2], color = province_en), size = 3) +
  labs(title = "Spatio-Temporal Dynamics of Road Accidents in BMR (2019)",
       subtitle = 'Date: {frame_time}') +
  transition_time(acci_2019$incident_datetime) +  
  ease_aes('linear') + 
  theme_minimal()
anim <- animate(p, duration = 80, fps = 5)  
anim_save("road_accidents_bmr_2019.gif", animation = anim)
knitr::include_graphics("road_accidents_bmr_2019.gif")

As time goes by, we also find that accidents mostly occur at the border.

4.3 Visuaising geographic distribution of accidence by month

p_h_owin <- as.owin(p_h)
p_h_owin
window: polygonal boundary
enclosing rectangle: [676715.8, 685182.8] x [1510361.4, 1519352.4] units
class(p_h_owin)
[1] "owin"
tmap_mode("plot")
tmap mode set to plotting
tm_shape(p_h)+
  tm_polygons() +
tm_shape(acci) +
  tm_dots(size = 0.1) +
tm_facets(by="Month_fac", 
            free.coords=FALSE, 
            drop.units = TRUE)

The plot indicates that January, April, and December show a higher concentration of road accidents in the Bangkok Metropolitan Region (BMR) compared to other months. This suggests that these months may experience higher traffic incidents, potentially due to specific factors like holiday seasons, or increased travel during these times. Further analysis may be needed to explore the causes behind this trend and whether interventions during these months could help reduce the number of accidents.

4.3.1 Computing STKDE by Month

acci_month <- acci_high1 %>%
  dplyr::select(Month_num)
acci_month_ppp <- as.ppp(acci_month)
acci_month_ppp
Marked planar point pattern: 2408 points
marks are numeric, of storage type  'double'
window: rectangle = [644139, 706435.5] x [1505013.5, 1542620.4] units
any(duplicated(acci_month_ppp))
[1] TRUE
acci_month_ppp_jit <- rjitter(acci_month_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE) 
any(duplicated(acci_month_ppp_jit))
[1] FALSE
acci_month_owin <- acci_month_ppp_jit[p_h_owin]
summary(acci_month_owin)
Marked planar point pattern:  212 points
Average intensity 3.933767e-06 points per square unit

Coordinates are given to 10 decimal places

marks are numeric, of type 'double'
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   3.000   7.000   6.764  10.000  12.000 

Window: polygonal boundary
single connected closed polygon with 973 vertices
enclosing rectangle: [676715.8, 685182.8] x [1510361.4, 1519352.4] units
                     (8467 x 8991 units)
Window area = 53892400 square units
Fraction of frame area: 0.708
st_kde <- spattemp.density(acci_month_owin)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
summary(st_kde)
Spatiotemporal Kernel Density Estimate

Bandwidths
  h = 680.4161 (spatial)
  lambda = 0.7732 (temporal)

No. of observations
  212 

Spatial bound
  Type: polygonal
  2D enclosure: [676715.8, 685182.8] x [1510361, 1519352]

Temporal bound
  [1, 12]

Evaluation
  128 x 128 x 12 trivariate lattice
  Density range: [1.36116e-28, 3.825176e-08]
tims <- c(1,2,3,4,5,6,7,8,9,10,11,12)
par(mfcol=c(2,3))
for(i in tims){ 
  plot(st_kde, i, 
       override.par=FALSE, 
       fix.range=TRUE, 
       main=paste("KDE at month",i))
}

The accident hotspots are more densely distributed in Jan and Dec , especially in the southern and central regions, indicating that traffic accidents in different months have obvious spatial and temporal dynamics. .

start <- as.POSIXct("2019/01/01", format = "%Y/%m/%d")
acci_high1$Time <- difftime(acci_high1$incident_datetime, start, units = "days")
acci_high1$Time <- as.numeric(acci_high1$Time)
months <- as.character(1:12)
months <- ifelse(nchar(months) == 1, paste0("0", months), months)
months_starts_labs <- paste("2019/", months, "/01", sep = "")
months_starts_num <- as.POSIXct(months_starts_labs, format = "%Y/%m/%d")
months_starts_num <- difftime(months_starts_num, start, units = "days")
months_starts_num <- as.numeric(months_starts_num)
months_starts_labs <- gsub("2019/", "", months_starts_labs, fixed = TRUE)

The time difference from January 1, 2019 is used to calculate and mark the relative time (in days) when the accident occurred. This operation prepares for subsequent time series analysis or visualization, so that the accident data can be displayed on a timeline.

4.4 Spatial point pattern analysis by hours

We will perform analysis on traffic accident data (acci) ,especially operating on the hour when the accident occurred (hour)

acci_hour <- acci_high1 %>% 
  dplyr::select(hour)
acci_hour_ppp <- as.ppp(acci_hour)
acci_hour_ppp
Marked planar point pattern: 2408 points
marks are numeric, of storage type  'integer'
window: rectangle = [644139, 706435.5] x [1505013.5, 1542620.4] units
summary(acci_hour_ppp)
Marked planar point pattern:  2408 points
Average intensity 1.027839e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 10 decimal places

marks are numeric, of type 'integer'
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    8.00   13.00   12.72   18.00   23.00 

Window: rectangle = [644139, 706435.5] x [1505013.5, 1542620.4] units
                    (62300 x 37610 units)
Window area = 2342780000 square units
any(duplicated(acci_hour_ppp))
[1] TRUE

We found that there is duplicate from the data set ,so next we conduct the method to delete them.

acci_hour_ppp_jit <- rjitter(acci_hour_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE) 
any(duplicated(acci_hour_ppp_jit))
[1] FALSE
any(duplicated(acci))
[1] FALSE
acci_hour_owin <- acci_hour_ppp[p_h_owin]
summary(acci_hour_owin)
Marked planar point pattern:  234 points
Average intensity 4.341988e-06 points per square unit

Coordinates are given to 10 decimal places

marks are numeric, of type 'integer'
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    7.25   13.00   12.33   19.00   23.00 

Window: polygonal boundary
single connected closed polygon with 973 vertices
enclosing rectangle: [676715.8, 685182.8] x [1510361.4, 1519352.4] units
                     (8467 x 8991 units)
Window area = 53892400 square units
Fraction of frame area: 0.708
class(acci_hour_ppp_jit)
[1] "ppp"
plot(acci_hour_owin)

4.5 Computing STKDE by Day

acci_yday_ppp <- acci_high1 %>% 
  dplyr::select(dayofyear) %>%
  as.ppp()
acci_yday_owin <- acci_yday_ppp[p_h_owin]
summary(acci_yday_owin)
Marked planar point pattern:  234 points
Average intensity 4.341988e-06 points per square unit

Coordinates are given to 10 decimal places

marks are numeric, of type 'double'
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    2.0    81.5   200.5   189.1   285.0   361.0 

Window: polygonal boundary
single connected closed polygon with 973 vertices
enclosing rectangle: [676715.8, 685182.8] x [1510361.4, 1519352.4] units
                     (8467 x 8991 units)
Window area = 53892400 square units
Fraction of frame area: 0.708
any(duplicated(acci_yday_ppp))
[1] TRUE
acci_yday_ppp_jit <- rjitter(acci_yday_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE) 
any(duplicated(acci_yday_ppp_jit))
[1] FALSE

The set contains a flat point pattern of 12,986 markers with an average intensity of 1.69e-06, indicating that the points are distributed at a low density per square unit. The pattern contains duplicate points (“Pattern contains duplicated points”), which indicates that some incidents occurred at the same or very close geographic locations. In addition, the polygonal bounding area of the window covers 76,689,000 square units.

acci_yday_owin2 <- acci_yday_ppp_jit[p_h_owin]
summary(acci_yday_owin2)
Marked planar point pattern:  218 points
Average intensity 4.0451e-06 points per square unit

Coordinates are given to 10 decimal places

marks are numeric, of type 'double'
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   78.75  195.00  187.72  287.75  361.00 

Window: polygonal boundary
single connected closed polygon with 973 vertices
enclosing rectangle: [676715.8, 685182.8] x [1510361.4, 1519352.4] units
                     (8467 x 8991 units)
Window area = 53892400 square units
Fraction of frame area: 0.708

We can see that the number of incident points in the processed point pattern data (Marked Planar Point Pattern) is 12986 and 12976 respectively, showing the average intensity (incident point density per square unit). The polygonal boundary of the window and the related area information are also explained in the output. The reduction in the number of points in the second result may be due to the application of the rjitter function.

kde_yday <- spattemp.density(
  acci_yday_owin2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
summary(kde_yday)
Spatiotemporal Kernel Density Estimate

Bandwidths
  h = 648.9846 (spatial)
  lambda = 21.8024 (temporal)

No. of observations
  218 

Spatial bound
  Type: polygonal
  2D enclosure: [676715.8, 685182.8] x [1510361, 1519352]

Temporal bound
  [2, 361]

Evaluation
  128 x 128 x 360 trivariate lattice
  Density range: [1.861073e-32, 1.33606e-09]

4.6 Temporal Network Spatial Point Patterns Analysis

4.6.1 Kernel density estimation is performed for each year

time_kernel_values <- acci_high1 %>%
  group_by(Year) %>%
  do({
    data = .
    samples <- seq(0, max(data$Time), 0.5)
    data.frame(
      bw_10 = tkde(data$Time, w = rep(1, nrow(data)), samples = samples, bw = 10, kernel_name = "quartic"),
      bw_20 = tkde(data$Time, w = rep(1, nrow(data)), samples = samples, bw = 20, kernel_name = "quartic"),
      bw_30 = tkde(data$Time, w = rep(1, nrow(data)), samples = samples, bw = 30, kernel_name = "quartic"),
      bw_40 = tkde(data$Time, w = rep(1, nrow(data)), samples = samples, bw = 40, kernel_name = "quartic"),
      bw_50 = tkde(data$Time, w = rep(1, nrow(data)), samples = samples, bw = 50, kernel_name = "quartic"),
      bw_60 = tkde(data$Time, w = rep(1, nrow(data)), samples = samples, bw = 60, kernel_name = "quartic"),
      time = samples,
      Year = unique(data$Year)
    )
  })
df_time <- melt(time_kernel_values, id.vars = c("time", "Year"))
df_time$variable <- as.factor(df_time$variable)
selected_months <- c("01/01", "04/01", "07/01", "10/01", "12/01")
selected_months_num <- months_starts_num[months_starts_labs %in% selected_months]
selected_months_labs <- months_starts_labs[months_starts_labs %in% selected_months]
ggplot(data = df_time) + 
  geom_line(aes(x = time, y = value, color = variable)) +  
  scale_x_continuous(breaks = selected_months_num, labels = selected_months_labs) +
  facet_wrap(~ Year, ncol = 1, scales = "free_y") +  
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),  
    axis.text.y = element_text(size = 8)  
  ) +
  labs(x = "Time (Months)", y = "Density", title = "Time Kernel Density Estimation for Each Year") +
  scale_color_manual(values = c("blue", "green", "red", "purple", "orange", "brown"), 
                     name = "Bandwidths", 
                     labels = c("bw_10", "bw_20", "bw_30", "bw_40", "bw_50", "bw_60"))

It is not surprising to observe that most of the accidents occur during spring in April and winter in December.

Continue the previous max density area,we add the time into analysis,which we select the April and the December.

H0:In the street network of amut Prakan, the occurrences of traffic accidents are completely random, meaning they exhibit a uniform distribution that follows the Complete Spatial Randomness (CSR) hypothesis.

H1:In the street network of amut Prakan, the occurrences of traffic accidents are not completely random, meaning they do not follow the Complete Spatial Randomness (CSR) hypothesis.

acci_road_intersection_filtered <- acci_road_intersection %>%
  filter(month(incident_datetime) %in% c(4,12))
duplicates <- st_equals(acci_road_intersection_filtered, acci_road_intersection_filtered)

acci_road_intersection_filtered <- acci_road_intersection_filtered[!duplicated(duplicates), ]
kfun_acci_max_density <- kfunctions(
  road_hd, 
  acci_road_intersection_filtered,  
  start = 0, 
  end = 5000,  
  step = 200,  
  width = 50, 
  nsim = 20,  
  resolution = 20,  
  verbose = FALSE, 
  conf_int = 0.05, 
  agg = 100000  
)
saveRDS(kfun_acci_max_density, "data/rds/kfun_acci_max_density.rds")
kfun_acci_max_density <- readRDS("data/rds/kfun_acci_max_density.rds")
kfun_acci_max_density$plotk

As can be seen from the figure, the blue solid line (empirical K function) is above the gray area, especially in the distance range of 1000 to 4000 meters, which shows that the traffic accident points show obvious spatial clustering within this range.

4.7 Combine All

We can now estimate the kernel density in both space and time for the Bangkok where has more accidence.

cv_scores <- bw_tnkde_cv_likelihood_calc(
  bws_net = seq(200, 1100, 100),  
  bws_time = seq(10, 70, 10),    
  lines = road_b,
  events = acci_b,
  time_field = "Time",
  w = rep(1, nrow(acci_b)),
  kernel_name = "quartic",
  method = "discontinuous",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 10,
  digits = 2,
  tol = 0.1,
  agg = 10,
  sparse = TRUE,
  grid_shape = c(1,1),
  sub_sample = 1,
  verbose = FALSE,
  check = TRUE)


knitr::kable(cv_scores)

According to the “leave one out cross validation” method, the optimal set of bandwidths is 1100 metres and 70 days.

# choosing sample in times 
sample_time <- seq(0, max(acci_road_intersection_filtered$Time), 10)
# calculating densities
tnkde_densities <- tnkde(lines = road_b,
                   events = acci_b_412,
                   time_field = "Time",
                   w = rep(1, nrow(acci_b_412)), 
                   samples_loc = samples,
                   samples_time = sample_time, 
                   kernel_name = "quartic",
                   bw_net = 1100, bw_time = 70,
                   adaptive = TRUE,
                   trim_bw_net = 700,
                   trim_bw_time = 60,
                   method = "discontinuous",
                   div = "bw", max_depth = 10,
                   digits = 2, tol = 0.01,
                   agg = 10, grid_shape = c(1,1), 
                   verbose  = FALSE)
saveRDS(tnkde_densities, "data/rds/tnkde_densitiesy.rds")
tnkde_densities <- readRDS("data/rds/tnkde_densitiesy.rds")
# creating a color palette for all the densities
all_densities <- c(tnkde_densities$k)
color_breaks <- classIntervals(all_densities, n = 10, style = "kmeans")
# Generate a map for each sampling time point
all_maps <- lapply(1:ncol(tnkde_densities$k), function(i){
  time <- sample_time[[i]]
  date <- as.Date(start) + time

  samples$density <- tnkde_densities$k[,i]
map1 <-  
    tm_shape(Bangkok_b) +  
    tm_borders(col = "black", lwd = 1, alpha = 0.5) + 
    tm_shape(samples) + 
    tm_dots(col = "density", size = 0.008, 
            breaks = color_breaks$brks, palette = viridis(10)) + 
    tm_layout(legend.show = FALSE, main.title = as.character(date), main.title.size = 0.5)

return(map1)
})
tmap_animation(all_maps, filename = "images/animated_map.gif", 
               width = 800, height = 800, dpi = 150, delay = 50)
knitr::include_graphics("images/animated_map.gif")

Now we can visualize the dynamic patterns of accidents across different years, days, and hours.

5 Conclusion

Using Network Spatial Point Patterns Analysis (NSPPA) and Temporal Network Spatial Point Patterns Analysis (TNSPPA) methods to study the reasons behind Bangkok’s car accidents, we can conclude the following:

Accident Clusters: There is significant spatial clustering of accidents, especially in key road intersections and major highways in Bangkok.

Temporal Variations: The temporal analysis reveals that accident rates are higher during specific months (e.g., January and April), likely due to increased traffic during holidays.

KDE Insights: Kernel Density Estimation shows shifting accident hotspots over time, reflecting traffic flow changes and potential urban development impacts.

Seasonal Factors: Changes in accident frequency could be linked to seasonal weather variations,holidays like “Songkran Festival” in April or “Christmas” in December or traffic regulations during different times of the year.

These analyses suggest targeted interventions could be developed to address the specific high-risk areas and periods to reduce accident rates.

6 Reference

1.Tin Seong Kam.Spatial Point Patterns Analysis 4.5.6.7

2.Network k Functions

3.Network Kernel Density Estimate

4.Temporal Network Kernel Density Estimate